library(tidyverse)
library(plotly)
theme_set(theme_minimal())
shot_data <- readxl::read_excel("Sample Player Test Data.xlsx")
shot_data <- shot_data %>%
mutate_at(c("Player", "Club"), factor)
Assuming that the goal of any club test is to find better accuracy, more distance, or some combination of both. I’m choosing to start by looking at accuracy.
shot_data %>%
select(Player, Club, `Actual Offline`) %>%
group_by(Club) %>%
summarise(Avg_Miss = mean(`Actual Offline`, na.rm = TRUE))
## # A tibble: 2 x 2
## Club Avg_Miss
## <fct> <dbl>
## 1 Club A 6.76
## 2 Club B -0.147
It looks like there might be one club that is much closer to the target than the other. Next, I want to see what the distribution of this offline data looks like.
pl <- ggplot(shot_data, aes(`Actual Offline`)) +
geom_density() +
geom_vline(xintercept = mean(shot_data$`Actual Offline`), color = 'red', alpha = 0.34) +
scale_x_continuous(limits = c(-70, 70))
ggplotly(pl)
Looks more or less normal which means that I can use a T-Test to verify what the simple means above are hinting at. First I’ll check that the distribution is normal using a Shapiro-Wilk test.
shapiro.test(shot_data$`Actual Offline`)
##
## Shapiro-Wilk normality test
##
## data: shot_data$`Actual Offline`
## W = 0.99707, p-value = 0.6948
t_test <- t.test(shot_data$`Actual Offline`~shot_data$Club)
t_test
##
## Welch Two Sample t-test
##
## data: shot_data$`Actual Offline` by shot_data$Club
## t = 3.9666, df = 397.99, p-value = 8.647e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.484323 10.332125
## sample estimates:
## mean in group Club A mean in group Club B
## 6.7615757 -0.1466481
Cool, so it looks like Club B is up closer to the target on average for most of the participants. I like to see results like this visually so I will now make a couple of visuals to help enforce this result.
p <- shot_data %>%
group_by(Player, Club) %>%
summarise(Avg_Miss = mean(`Actual Offline`)) %>%
ungroup() %>%
ggplot(aes(Club, Avg_Miss, col = Club)) +
geom_boxplot() +
coord_flip()
ggplotly(p)
offline <- shot_data %>%
group_by(Player, Club) %>%
summarise(Avg_Miss = mean(`Actual Offline`)) %>%
ungroup()
plot <- ggplot(offline, aes(Club, Avg_Miss, group = Player)) +
geom_point(aes(text = paste("Avg Miss: ", Avg_Miss, ", Player: ", Player), color = Player, size = 3, alpha = 0.5)) +
geom_line(aes(color = Player, alpha = 0.5)) +
coord_flip() +
theme(legend.position = "none")
ggplotly(plot, tooltip = c("text"))
Pretty clear that Club B is biasing shots more left than Club A. There could be some left-handed players which might explain some of the players going farther right than they started. I want to see if there was any effect on the top ten players with the most significant accuracy gains as far as distance is concerned.
diff_df <- shot_data %>%
group_by(Player, Club) %>%
summarise(Avg_Miss = mean(`Actual Offline`)) %>%
spread(Club, Avg_Miss) %>%
mutate(Diff_Miss = `Club A` - `Club B`) %>%
arrange(desc(abs(Diff_Miss)))
dva <- shot_data %>%
right_join(diff_df) %>%
select(Player, Club, `Actual Carry`, Diff_Miss) %>%
group_by(Player, Club, Diff_Miss) %>%
summarise(Avg_carry = mean(`Actual Carry`)) %>%
spread(Club, Avg_carry) %>%
mutate(Diff_Carry = `Club A` - `Club B`) %>%
select(Player, Diff_Miss, Diff_Carry) %>%
arrange(desc(abs(Diff_Carry)))
## Joining, by = "Player"
Interestingly it seems that the players with the biggest accuracy gains had the smallest change in carry yards in the group.
ggplot(dva, aes(Diff_Miss, Diff_Carry)) +
geom_point(aes(col = Player, size = 3, alpha = 0.34)) +
geom_smooth(method = lm, color = "FireBrick") +
theme(legend.position = 'none')
A quick plot shows a general downward trend suggesting that there is a negative correlation in this case between accuracy gained and distance delta. However the confidence interval surrounding the line is massive, so I’m not sure I could make a definitive statement here. What does seem to be clear is that Club B averages about 7 yards closer to the target than Club A.